TC-DWA: Text Clustering with Dual Word-Level Augmentation
Abstract
Pre-trained language models, e.g., ELMo and BERT, have recently achieved promising performance improvements in a wide range of NLP tasks, because they can output strong contextualized embedded features of words. Inspired by their great success, in this paper we target fine-tuning them to effectively handle the text clustering task, i.e., a classic and fundamental challenge in machine learning. Accordingly, we propose a novel BERT-based method, namely Text Clustering with Dual Word-level Augmentation (TCDWA). To be specific, we formulate a self-training objective and enhance it with a dual word-level augmentation technique. First, we suppose that each text contains several most informative words, called anchor words, which support the full semantics of the text. We use these anchor words as augmented data; they are selected by ranking the norm-based attention weights of words. Second, we formulate an expectation form of word augmentation, which is equivalent to generating infinitely many augmented features, and further suggest a tractable approximation based on Taylor expansion for efficient optimization. To evaluate the effectiveness of TCDWA, we conduct extensive experiments on benchmark datasets. The results demonstrate that TCDWA consistently outperforms state-of-the-art baseline methods. Code available: https://github.com/BoCheng-96/TC-DWA.
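The following is a minimal, hedged sketch of two mechanisms the abstract names: ranking words by norm-based attention weights to pick anchor words, and a self-training target for clustering. It rests on assumptions that are not taken from the paper. First, a token's norm-based weight is taken to be the mean, over query positions i, of ||a_ij * v_j||, where a_ij is an attention weight and v_j is the token's value vector (in the spirit of norm-based attention analysis). Second, the self-training objective is assumed to be DEC-style (Deep Embedded Clustering): Student's t soft assignments are sharpened into a target distribution. TC-DWA's actual objective may differ. All function names and the `skip` set of special tokens are hypothetical. The expectation-form augmentation and its Taylor approximation are not modeled here. The linked repository holds the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def norm_based_scores(attn: Sequence[Vector], values: Sequence[Vector]) -> List[float]:
    """Assumed scoring: mean over queries i of ||attn[i][j] * values[j]||.

    Softmax attention weights are non-negative, so ||a * v|| == a * ||v||.
    """
    num_queries = len(attn)
    value_norms = [math.sqrt(sum(x * x for x in v)) for v in values]
    return [
        sum(row[j] for row in attn) * value_norms[j] / num_queries
        for j in range(len(values))
    ]


def select_anchor_words(tokens: Sequence[str], attn: Sequence[Vector],
                        values: Sequence[Vector], k: int,
                        skip: frozenset = frozenset({"[CLS]", "[SEP]"})) -> List[str]:
    """Hypothetical helper: return the k highest-scoring tokens, ignoring `skip`."""
    scores = norm_based_scores(attn, values)
    candidates = [j for j, tok in enumerate(tokens) if tok not in skip]
    # Highest score first; ties broken by original token position.
    candidates.sort(key=lambda j: (-scores[j], j))
    return [tokens[j] for j in candidates[:k]]


def soft_assignments(z: Sequence[Vector], centroids: Sequence[Vector],
                     alpha: float = 1.0) -> List[List[float]]:
    """DEC-style Student's t similarity between embeddings and centroids (assumed)."""
    q = []
    for zi in z:
        row = [(1.0 + sum((a - b) ** 2 for a, b in zip(zi, mu)) / alpha)
               ** (-(alpha + 1.0) / 2.0) for mu in centroids]
        total = sum(row)
        q.append([r / total for r in row])
    return q


def target_distribution(q: Sequence[Vector]) -> List[List[float]]:
    """Sharpened targets p_ik proportional to q_ik**2 / f_k, where f_k = sum_i q_ik."""
    f = [sum(row[k] for row in q) for k in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[k] ** 2 / f[k] for k in range(len(row))]
        total = sum(w)
        p.append([x / total for x in w])
    return p


if __name__ == "__main__":
    tokens = ["[CLS]", "great", "the", "film", "[SEP]"]
    # Two query positions with identical attention rows (toy data).
    attn = [[0.1, 0.4, 0.1, 0.3, 0.1]] * 2
    values = [[3.0, 4.0], [1.0, 0.0], [2.0, 0.0], [0.0, 2.0], [5.0, 0.0]]
    print(select_anchor_words(tokens, attn, values, k=2))  # ['film', 'great']
```

In this toy input, "great" receives more raw attention (0.4) than "film" (0.3). However, "film" has a value vector twice as long, so its norm-based score (0.6 vs 0.4) ranks it first. This is why one would rank by weighted-vector norm rather than by the raw attention weight. Special tokens are skipped because they often receive large scores without carrying content.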
Similar resources
Word clustering effect on vocabulary learning of EFL learners: A case of semantic versus phonological clustering
The aim of this study is to determine the effect of word clustering method on vocabulary learning of Iranian EFL learners through a case of semantic versus phonological clustering. To this effect, 80 homogeneous students from four intermediate classes at an English institute in Torbat e Heydariyeh participated in this research. They were assigned to four groups according to semantic versus phon...
Document Clustering with Dual Supervision
Nowadays, academic researchers maintain a personal library of papers, which they would like to organize based on their needs, e.g., research, projects, or courseware. Clustering techniques are often employed to achieve this goal by grouping the document collection into different topics. Unsupervised clustering does not require any user effort but only produces one universal output with which us...
Labor Augmentation with Oxytocin Decreases Glutathione Level
Objective. To compare oxidative stress following spontaneous vaginal delivery with that induced by Oxytocin augmented delivery. Methods. 98 women recruited prior to labor. 57 delivered spontaneously, while 41 received Oxytocin for augmentation of labor. Complicated deliveries and high-risk pregnancies were excluded. Informed consent was documented. Arterial cord blood gases, levels of Hematocri...
Modeling Word Senses With Fuzzy Clustering
This thesis describes a clustering approach to automatically inferring soft semantic classes and characterizing senses of a set of Norwegian nouns. The words are represented by way of their distribution in text, identified as local contexts in the form of lexical-syntactic relations. Through a shallow processing step the context features are extracted for lemmatized word forms in syntactically ...
Improving Document Ranking with Dual Word Embeddings
This paper investigates the popular neural word embedding method Word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the inp...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i6.25868